In computer science, a lookup table ( LUT) is an array that replaces runtime computation of a mathematical function with a simpler array indexing operation, in a process termed as direct addressing. The savings in processing time can be significant, because retrieving a value from memory is often faster than carrying out an "expensive" computation or input/output operation. The tables may be precomputation and stored in static program storage, calculated (or prefetcher) as part of a program's initialization phase ( memoization), or even stored in hardware in application-specific platforms. Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array and, in some programming languages, may include pointer functions (or offsets to labels) to process the matching input. FPGAs also make extensive use of reconfigurable, hardware-implemented, lookup tables to provide programmable hardware functionality. LUTs differ from hash tables in a way that, to retrieve a value with key , a hash table would store the value in the slot where is a hash function i.e. is used to compute the slot, while in the case of LUT, the value is stored in slot , thus directly addressable.
In ancient (499 AD) India, Aryabhata created one of the first sine tables, which he encoded in a Sanskrit-letter-based number system. In 493 AD, Victorius of Aquitaine wrote a 98-column multiplication table which gave (in Roman numerals) the product of every number from 2 to 50 times and the rows were "a list of numbers starting with one thousand, descending by hundreds to one hundred, then descending by tens to ten, then by ones to one, and then the fractions down to 1/144"Maher, David. W. J. and John F. Makowski. " Literary Evidence for Roman Arithmetic With Fractions", 'Classical Philology' (2001) Vol. 96 No. 4 (2001) pp. 376–399. (See page p.383.) Modern school children are often taught to memorize "times tables" to avoid calculations of the most commonly used numbers (up to 9 x 9 or 12 x 12).
Early in the history of computers, input/output operations were particularly slow – even in comparison to processor speeds of the time. It made sense to reduce expensive read operations by a form of manual caching by creating either static lookup tables (embedded in the program) or dynamic prefetched arrays to contain only the most commonly occurring data items. Despite the introduction of systemwide caching that now automates this process, application level lookup tables can still improve performance for data items that rarely, if ever, change.
Lookup tables were one of the earliest functionalities implemented in computer , with the initial version of VisiCalc (1979) including a LOOKUP function among its original 20 functions. Bill Jelen: "From 1979 – VisiCalc and LOOKUP"!, by MrExcel East, 31 March 2012 This has been followed by subsequent spreadsheets, such as Microsoft Excel, and complemented by specialized VLOOKUP and HLOOKUP functions to simplify lookup in a vertical or horizontal table. In Microsoft Excel the XLOOKUP function has been rolled out starting 28 August 2019.
A simple example of C code, designed to count the 1 bits in a int, might look like this:
int result = 0;
while (x != 0) {
x = x & (x - 1);
result++;
}
return result;
}
The above implementation requires 32 operations for an evaluation of a 32-bit value, which can potentially take several clock cycles due to Control flow. It can be "loop unrolling" into a lookup table which in turn uses trivial hash function for better performance.
The bits array, bits_set with 256 entries is constructed by giving the number of one bits set in each possible byte value (e.g. 0x00 = 0, 0x01 = 1, 0x02 = 1, and so on). Although a runtime algorithm can be used to generate the bits_set array, it's an inefficient usage of clock cycles when the size is taken into consideration, hence a precomputed table is used—although a compile time script could be used to dynamically generate and append the table to the source file. Sum of ones in each byte of the integer can be calculated through trivial hash function lookup on each byte; thus, effectively avoiding branches resulting in considerable improvement in performance.
union four_bytes {
int big_int;
char each_byte[4];
} operand = input_value;
const int bits_set[256] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4,
2, 3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4,
2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6,
4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5,
3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6,
4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8};
return (bits_set[operand.each_byte[0]] + bits_set[operand.each_byte[1]] +
bits_set[operand.each_byte[2]] + bits_set[operand.each_byte[3]]);
In image processing, lookup tables are often called LUTs (or 3DLUT), and give an output value for each of a range of index values. One common LUT, called the colormap or palette, is used to determine the colors and intensity values with which a particular image will be displayed. In computed tomography, "windowing" refers to a related concept for determining how to display the intensity of measured radiation.
Functions of a single variable (such as sine and cosine) may be implemented by a simple array. Functions involving two or more variables require multidimensional array indexing techniques. The latter case may thus employ a two-dimensional array of powerxy to replace a function to calculate xy for a limited range of x and y values. Functions that have more than one result may be implemented with lookup tables that are arrays of structures.
As mentioned, there are intermediate solutions that use tables in combination with a small amount of computation, often using interpolation. Pre-calculation combined with interpolation can produce higher accuracy for values that fall between two precomputed values. This technique requires slightly more time to be performed but can greatly enhance accuracy in applications that require it. Depending on the values being precomputed, precomputation with interpolation can also be used to shrink the lookup table size while maintaining accuracy.
While often effective, employing a lookup table may nevertheless result in a severe penalty if the computation that the LUT replaces is relatively simple. Memory retrieval time and the complexity of memory requirements can increase application operation time and system complexity relative to what would be required by straight formula computation. The possibility of cache pollution may also become a problem. Table accesses for large tables will almost certainly cause a cache miss. This phenomenon is increasingly becoming an issue as processors outpace memory. A similar issue appears in rematerialization, a compiler optimization. In some environments, such as the Java programming language, table lookups can be even more expensive due to mandatory bounds-checking involving an additional comparison and branch for each lookup.
There are two fundamental limitations on when it is possible to construct a lookup table for a required operation. One is the amount of memory that is available: one cannot construct a lookup table larger than the space available for the table, although it is possible to construct disk-based lookup tables at the expense of lookup time. The other is the time required to compute the table values in the first instance; although this usually needs to be done only once, if it takes a prohibitively long time, it may make the use of a lookup table an inappropriate solution. As previously stated however, tables can be statically defined in many cases.
However, this can be expensive to compute, especially on slow processors, and there are many applications, particularly in traditional computer graphics, that need to compute many thousands of sine values every second. A common solution is to initially compute the sine of many evenly distributed values, and then to find the sine of x we choose the sine of the value closest to x through array indexing operation. This will be close to the correct value because sine is a continuous function with a bounded rate of change. For example:
function lookup_sine(x)
sine_table[x] = sine(pi * x / 1000)
return sine_table[round(1000 * x / pi)]
Unfortunately, the table requires quite a bit of space: if IEEE double-precision floating-point numbers are used, over 16,000 bytes would be required. We can use fewer samples, but then our precision will significantly worsen. One good solution is linear interpolation, which draws a line between the two points in the table on either side of the value and locates the answer on that line. This is still quick to compute, and much more accurate for such as the sine function. Here is an example using linear interpolation:
x1 = floor(x * 1000 / pi)
y1 = sine_table[x1]
y2 = sine_table[x1 + 1]
return y1 + (y2 - y1) * (x * 1000 / pi - x1)
Linear interpolation provides for an interpolated function that is continuous, but will not, in general, have continuous . For smoother interpolation of table lookup that is continuous and has continuous first derivative, one should use the cubic Hermite spline.
When using interpolation, the size of the lookup table can be reduced by using nonuniform sampling, which means that where the function is close to straight, we use few sample points, while where it changes value quickly we use more sample points to keep the approximation close to the real curve. For more information, see interpolation.
An n-bit LUT can encode any n-input Boolean function by storing the truth table of the function in the LUT. This is an efficient way of encoding Boolean logic functions, and LUTs with 4-6 bits of input are in fact the key component of modern field-programmable gate arrays (FPGAs) which provide reconfigurable hardware logic capabilities.
In some systems, may also be defined in place of lookup tables for these calculations.
|
|